
Extracting Match Stats From Halo Infinite Film Files

This blog post is part of a series on exploring the Halo game API.

Introduction #

One of the conversations in my blog comments led to a discussion about film files in Halo Infinite. In case you are not familiar with them, no worries - they’re a pretty obscure component of the match data that I haven’t yet covered in depth on my blog or here, on the OpenSpartan blog.

The idea behind film files is simple - they aren’t a traditional video but rather a collection of game engine metadata captured during gameplay. When you complete a match, a “film” (a recording of all match metadata) is saved, and you end up with a whole bunch of binary content that is available through a dedicated API endpoint.

Before we go down this rabbit hole, I want to give a massive shout-out to Andy Curtis for doing quite a bit of work digging through film file structure 🙌

Finding the film files #

Before we get to the film content, let’s figure out how to find the films. To get started, first request your own matches from the Halo Infinite API. This will give you the match IDs that we can later use to query for film data. You can send a request to this endpoint to get the most recent matches:

https://halostats.svc.halowaypoint.com/hi/players/xuid({{XUID}})/matches?count=25

In the example above, {{XUID}} is your numeric player identifier. I talked about the process of converting a gamertag into a XUID in a separate blog post.

You will need to make sure that you authenticate for the API call above to succeed (and all other API calls in this blog post). You can learn more about this in Halo Infinite Web API Authentication.
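If you prefer to make that call from code instead of a raw HTTP client, here is a minimal C# sketch (assuming you already have a Spartan token from the authentication flow above; the header name matches the one used by the Bash script later in this post):

// Minimal sketch: fetch the 25 most recent matches for a player.
// YOUR_TOKEN and YOUR_XUID are placeholders you need to fill in yourself.
using System.Net.Http;

var client = new HttpClient();
client.DefaultRequestHeaders.Add("x-343-authorization-spartan", "v4=YOUR_TOKEN");
client.DefaultRequestHeaders.Add("Accept", "application/json");

var url = "https://halostats.svc.halowaypoint.com/hi/players/xuid(YOUR_XUID)/matches?count=25";
var json = await client.GetStringAsync(url);
Console.WriteLine(json);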

The match data you get back is in JSON format by default, like this:

{
    "Start": 0,
    "Count": 25,
    "ResultCount": 25,
    "Results": [
        {
            "MatchId": "4fb89c93-53e1-4d7e-b273-5f4c4c1a58e4",
            "MatchInfo": {
                "StartTime": "2024-09-16T02:35:15.505Z",
                "EndTime": "2024-09-16T02:42:08.144Z",
                "Duration": "PT6M31.0705518S",
                "LifecycleMode": 3,
                "GameVariantCategory": 9,
                "LevelId": "1216247c-bf6d-4740-8270-e800a114f231",
                "MapVariant": {
                    "AssetKind": 2,
                    "AssetId": "37a9b5f0-6be7-4a46-8010-1fe6f7ea5611",
                    "VersionId": "e1cbf812-4f4e-44fc-9ef8-dd9ab5c4e4cf"
                },
                "UgcGameVariant": {
                    "AssetKind": 6,
                    "AssetId": "0e198591-ac15-4f99-8ff2-dd390decad66",
                    "VersionId": "168e6c3a-fdf3-4edd-af79-c0ffe5475026"
                },
                "ClearanceId": "bb31018c-8ca3-4673-b870-5193cfdf18f5",
                "Playlist": {
                    "AssetKind": 3,
                    "AssetId": "1b1691dc-d8b9-4b1f-825d-cb1c065184c1",
                    "VersionId": "38ecf0d8-82ca-4831-b186-eda51653f2ba"
                },
                "PlaylistExperience": 2,
                "PlaylistMapModePair": {
                    "AssetKind": 7,
                    "AssetId": "6b7c20a9-5eed-476f-9716-6d20e2f37f1a",
                    "VersionId": "56c4ba81-a659-4168-bc02-8f4135e693f9"
                },
                "SeasonId": "Csr/Seasons/CsrSeason8-1.json",
                "PlayableDuration": "PT6M31.063S",
                "TeamsEnabled": true,
                "TeamScoringEnabled": true,
                "GameplayInteraction": 1
            },
            "LastTeamId": 1,
            "Outcome": 2,
            "Rank": 1,
            "PresentAtEndOfMatch": true
        },
        [...MORE MATCH DATA...]
      ]
}

This is all useful metadata, but we are looking specifically for the match ID captured in the MatchId property. In my case, the match I am looking for is 4fb89c93-53e1-4d7e-b273-5f4c4c1a58e4, which is a recent Husky Raid game I’ve been a part of.

With the match ID in hand, we can now request the film chunks (every film has several “chunks” that are just binary data) by constructing the URL for another API endpoint, like this:

https://discovery-infiniteugc.svc.halowaypoint.com
  /hi
  /films
  /matches
  /4fb89c93-53e1-4d7e-b273-5f4c4c1a58e4
  /spectate

If the call succeeds, the metadata you will get will look like this:

{
    "FilmStatusBond": 1,
    "CustomData": {
        "FilmLength": 403190,
        "Chunks": [
            {
                "Index": 0,
                "ChunkStartTimeOffsetMilliseconds": 0,
                "DurationMilliseconds": 11,
                "ChunkSize": 465309,
                "FileRelativePath": "/filmChunk0",
                "ChunkType": 1
            },
            {
                "Index": 1,
                "ChunkStartTimeOffsetMilliseconds": 0,
                "DurationMilliseconds": 19972,
                "ChunkSize": 47858,
                "FileRelativePath": "/filmChunk1",
                "ChunkType": 2
            },
            {
                "Index": 2,
                "ChunkStartTimeOffsetMilliseconds": 19973,
                "DurationMilliseconds": 20003,
                "ChunkSize": 122480,
                "FileRelativePath": "/filmChunk2",
                "ChunkType": 2
            },
            [...MORE CHUNKS...]
        ],
        "HasGameEnded": true,
        "ManifestRefreshSeconds": 30,
        "MatchId": "4fb89c93-53e1-4d7e-b273-5f4c4c1a58e4",
        "FilmMajorVersion": 37
    },
    "BlobStoragePathPrefix": "https://blobs-infiniteugc.svc.halowaypoint.com/ugcstorage/film/1c7442bd-1f8d-4593-b7d0-1c95618c6876/e6796b9c-eb98-4c32-879a-5e5ab3d567f1/",
    "AssetId": "1c7442bd-1f8d-4593-b7d0-1c95618c6876"
}

The Halo Infinite API handles films by splitting them into separate chunks that contain different classes of in-game metadata for different parts of the game. You can see those chunks yourself in Theater mode - the timeline is clearly split into them (see the black markers):

Film fragments in Theater mode in Halo Infinite.

Film chunks are player-independent - they are recorded for the match itself and contain metadata about all players in it. To get the content of each chunk, we construct the URL from the BlobStoragePathPrefix property and the FileRelativePath of each chunk:

https://blobs-infiniteugc.svc.halowaypoint.com
  /ugcstorage
  /film
  /1c7442bd-1f8d-4593-b7d0-1c95618c6876
  /e6796b9c-eb98-4c32-879a-5e5ab3d567f1
  /filmChunk0

While this is not explicitly called out, the first GUID is the film asset ID and the second is the film asset version, similar to how game asset metadata is associated in the game CMS. If you already have film IDs, you can get the films directly without worrying about getting match IDs first.

With the URLs ready, we can now download every single chunk for a match and analyze them. If you are on Linux (or using Windows Subsystem for Linux) you can use this Bash script to quickly download all film chunks for a match (make sure to replace your token and clearance):

#!/bin/bash

# Check if match ID is provided
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <MATCH_ID>"
    exit 1
fi

MATCH_ID=$1

# Headers for the API request
AUTH_HEADER="x-343-authorization-spartan: v4=YOUR_TOKEN"
CLEARANCE_HEADER="343-clearance: CURRENT_CLEARANCE"
LANGUAGE_HEADER="Accept-Language: en-us"
ACCEPT_HEADER="accept: application/json"

echo "Fetching chunk information for match: ${MATCH_ID}..."
RESPONSE=$(curl --silent --location --request GET "https://discovery-infiniteugc.svc.halowaypoint.com/hi/films/matches/${MATCH_ID}/spectate" \
    --header "${AUTH_HEADER}" \
    --header "${CLEARANCE_HEADER}" \
    --header "${LANGUAGE_HEADER}" \
    --header "${ACCEPT_HEADER}" \
    -w "%{http_code}" -o response.json)

HTTP_STATUS="${RESPONSE}"
echo $HTTP_STATUS

# Check for successful response
if [[ "$HTTP_STATUS" != "200" ]]; then
    echo "Error fetching data: HTTP status $HTTP_STATUS"
    exit 1
fi

# Extract the base URL and film chunk paths
BASE_URL=$(jq -r '.BlobStoragePathPrefix' response.json)
CHUNK_PATHS=$(jq -r '.CustomData.Chunks[].FileRelativePath' response.json | sed 's|^/||')  # Remove leading slashes

# Clean up response file
rm response.json

# Loop through each chunk and download it
for CHUNK_PATH in $CHUNK_PATHS; do
    # Construct the full URL
    FULL_URL="${BASE_URL}${CHUNK_PATH}"
    COMPRESSED_FILE="compressed${CHUNK_PATH##*/}"
    DECOMPRESSED_FILE="DECOMPRESSED_${CHUNK_PATH##*/}"

    # Download the compressed chunk
    echo "Downloading chunk from ${FULL_URL}..."
    curl --location --request GET "${FULL_URL}" \
        --header "${AUTH_HEADER}" \
        --header "${CLEARANCE_HEADER}" \
        --header "${LANGUAGE_HEADER}" \
        --header "${ACCEPT_HEADER}" \
        --output "${COMPRESSED_FILE}"

    # Decompress the chunk
    echo "Decompressing ${COMPRESSED_FILE}..."
    python3 -c "import zlib, sys; sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))" < "${COMPRESSED_FILE}" > "${DECOMPRESSED_FILE}.bin"

    # Clean up compressed file
    rm "${COMPRESSED_FILE}"
    echo "Decompressed chunk saved as ${DECOMPRESSED_FILE}."
done

echo "All chunks downloaded and decompressed!"

You can make the script executable with chmod +x yourscript.sh and then run it by passing the match GUID as the first argument:

./yourscript.sh 1C5F57D3-1418-4BDE-A970-F8FAB6DFE110

This script helpfully decompresses the chunks as well, but we’ll get to that a bit later in this post.

As you look at the metadata for each chunk you will notice that individual chunks have a type. From what I can infer, they break down like this:

Chunk type | Description
1 | Game bootstrap metadata
2 | In-game event captures
3 | Game summary metadata

We’ll be using every single one of them in our explorations.

Dissecting chunk metadata #

Looking at existing chunks, we see that the ones with type 1 or 2 have very sparse event data, at least on the surface. However, they contain valuable information that we will need. To explore the content, let’s download a random chunk for an existing match:

https://blobs-infiniteugc.svc.halowaypoint.com/ugcstorage
  /film
  /1c7442bd-1f8d-4593-b7d0-1c95618c6876
  /e6796b9c-eb98-4c32-879a-5e5ab3d567f1
  /filmChunk3

Opening it in a hex editor produces this result:

Binary content for a Halo Infinite film chunk.

Not exactly “human-readable”, and that’s because we’re missing a core step here - decompression. The clue is in the first two bytes of the chunk file, 78 5E, which indicate zlib Fast Compression. You can read more about it in the official RFC. Looks like we’re dealing with compressed data, and we therefore need to “extract” it before attempting to read anything.
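As a side note, if you would rather stay in C# for the decompression, .NET 6 and later ship System.IO.Compression.ZLibStream, which handles exactly this format. A quick sketch, with the file names being just examples:

using System.IO.Compression;

// Peek at the header first: a zlib stream starts with 0x78.
byte[] raw = File.ReadAllBytes("chunk-compressed.bin");
Console.WriteLine(raw[0] == 0x78 ? "Looks like zlib" : "Unexpected header");

// Decompress the chunk into its own file.
using var input = File.OpenRead("chunk-compressed.bin");
using var zlib = new ZLibStream(input, CompressionMode.Decompress);
using var output = File.Create("decompressed_output.bin");
zlib.CopyTo(output);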

Let’s do this a bit differently than the hex-editor-only approach - we’re going to download the binary file with cURL and then decompress it with Python. Assuming that you are not already using the script I shared earlier to download every chunk, our first step is this:

curl --location --request GET 'https://blobs-infiniteugc.svc.halowaypoint.com/ugcstorage/film/1c7442bd-1f8d-4593-b7d0-1c95618c6876/e6796b9c-eb98-4c32-879a-5e5ab3d567f1/filmChunk3' --header 'x-343-authorization-spartan: v4=YOUR_AUTH_HEADER' --header '343-clearance: YOUR_CLEARANCE' --header 'Accept-Language: en-us' --header 'accept: application/json' --output chunk-compressed.bin

And then, we can run a bit of inline Python magic to decompress the content we just downloaded into its own file - decompressed_output.bin:

python3 -c "import zlib, sys; sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))" < chunk-compressed.bin > decompressed_output.bin

Uncompressed binary content for a Halo Infinite film chunk.

This looks a bit more promising because we can actually see repeating patterns. It’s even more promising if we look up events inside the chunk by the XUID of a player who was in the match. Because I am using a hex editor, I can easily search for the UInt64 value (all XUIDs are unsigned 64-bit integers), leading me to this:

7:B1E0h  00 00 00 00 00 00 00 00 00 00 00 00 5A 00 65 00  ............Z.e. 
7:B1F0h  42 00 6F 00 6E 00 64 00 00 00 00 00 00 00 00 00  B.o.n.d......... 
7:B200h  00 00 00 00 00 00 00 00 00 00 00 00 00 E5 DE DE  .............åÞÞ 
7:B210h  03 00 00 09 00 2D C0 00 00 00 04 58 00 00 00 00  .....-À....X.... 
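If your hex editor can’t search for integer values directly, you can generate the byte sequences to look for yourself. A small sketch (the XUID below is made up, and you may need to try both byte orders):

using System.Linq;

// Produce both byte orders of a XUID so you can paste them into a hex search.
ulong xuid = 2533274823451234; // hypothetical XUID - replace with a real one
byte[] littleEndian = BitConverter.GetBytes(xuid);
byte[] bigEndian = littleEndian.Reverse().ToArray();

Console.WriteLine(Convert.ToHexString(littleEndian));
Console.WriteLine(Convert.ToHexString(bigEndian));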

Because Halo Infinite is generally known to use quite a bit of Bond-encoded data, I wanted to pass the content of the file through my tool - bond-reader. Doing that was fruitless, though, as it turned out that the data is not Bond-formatted (at least not that I could tell from some short-term digging). I guess we’ll have to stick with inferring the binary structure through plain pattern analysis.

Another wrench thrown into our plans, also spotted by Andy Curtis, is the fact that data is not necessarily byte-aligned in the film chunks. That is, if you use a hex editor to spot patterns you might find some, but there is quite a bit of data “hiding” in plain sight simply because it isn’t positioned where a hex editor can render it.

Decoding unaligned data #

Because we can’t count on just our hex editor to find the data, we can write some custom code to find the things we want that are not aligned with our expectations 😎

To do that, here is a complete C# application that does just that - if you give it a byte pattern to search for (disregard the actual example pattern - it’s just a demo), it will try to find it regardless of how the data is actually aligned in the file:

namespace ComponentSearchByteAlign
{
    internal class Program
    {
        public static void Main(string[] args)
        {
            byte[] data = File.ReadAllBytes(@"PATH_TO_YOUR_DECOMPRESSED_BIN_FILE");

            // This can be a XUID or a gamertag to easily spot the data sequences
            byte[] pattern = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF };

            List<int> matchPositions = FindPattern(data, pattern);

            if (matchPositions.Count > 0)
            {
                Console.WriteLine($"Pattern found at bit positions ({matchPositions.Count} total):");
                foreach (int position in matchPositions)
                {
                    Console.WriteLine(position);
                }
            }
            else
            {
                Console.WriteLine("Pattern not found.");
            }
        }

        public static List<int> FindPattern(byte[] data, byte[] pattern)
        {
            List<int> matchPositions = [];
            int dataBitLength = data.Length * 8;
            int patternBitLength = pattern.Length * 8;

            for (int bitPos = 0; bitPos <= dataBitLength - patternBitLength; bitPos++)
            {
                if (IsBitMatch(data, pattern, bitPos))
                {
                    matchPositions.Add(bitPos);
                }
            }
            return matchPositions;
        }

        public static bool IsBitMatch(byte[] data, byte[] pattern, int bitOffset)
        {
            // Calculates the number of whole bytes to skip.
            // We divide bitOffset by 8 because there are 8 bits per byte.
            int byteOffset = bitOffset / 8;

            // Calculates how far into the byte (number of bits) we need to start.
            // It's the remainder when bitOffset is divided by 8, giving the bit position within the byte.
            int bitShift = bitOffset % 8;

            // On the above, a good example to visualize the behavior:
            // If bitOffset = 10, byteOffset = 1 (skip 1 full byte) and bitShift = 2 (start at the 3rd bit in the second byte - we skip 2).

            // We now iterate through every byte in the pattern that is given to
            // us when the function is called.
            for (int i = 0; i < pattern.Length; i++)
            {
                // Get the data byte that aligns with the current
                // pattern byte and shift its bits to the left by the
                // bit shift value calculated earlier.
                byte dataByte = (byte)(data[byteOffset + i] << bitShift);

                // If bitShift > 0, include bits from the next byte. This is
                // important for scenarios where, for example, we're shifting
                // by 3 bits, meaning that part of the data will come from the
                // next byte.
                if (byteOffset + i + 1 < data.Length && bitShift > 0)
                {
                    // Shifts the next byte to the right by the delta between 8
                    // and the calculated bit shift value, aligning it with the
                    // remaining part of the data byte.
                    // Note: bitwise OR (|=) is used to combine the shifted parts
                    // so that we can perform a full byte comparison.
                    dataByte |= (byte)(data[byteOffset + i + 1] >> (8 - bitShift));
                }

                // Compare dataByte with the current byte in the pattern
                if (dataByte != pattern[i])
                {
                    // Not matching at position. No point in
                    // continuing.
                    return false;
                }
            }

            // All bits match
            return true;
        }
    }
}

Running this code will enable us to quickly detect the positions of data sequences that contain relevant information. For example, one of the observations about the film file is that we can spot XUID references by looking at the 0x2D 0xC0 pattern. If we use this pattern and run the tool across a set of film chunks we’ll see quite a few results:

Result of running the segment detection application.

How bit shifting works #

Before we go any further, though, let me explain a bit of the “magic” of bit shifting that you might’ve noticed in the program above. Let’s say we have a data array like this:

Byte Index | Hex Value | Binary
0 | 0xAB | 10101011
1 | 0xCD | 11001101
2 | 0xEF | 11101111
3 | 0x12 | 00010010

The pattern we want to look for is this:

Byte Index | Hex Value | Binary
0 | 0xCD | 11001101
1 | 0xEF | 11101111

Let’s pick a random bit offset - 10. That means we’re starting at bit 10 (counting from zero) in the data array. If we look at the IsBitMatch function, it takes the bit offset as an argument.

That means that if we pass 10 as the value, we get a byteOffset of 1, meaning that we skip one entire byte when looking for the data.

Now, keep in mind that calculating byteOffset was not a “clean” division - we have a remainder, which is helpfully captured by bitShift. That remainder is equal to 2, which means that within the byte at index 1 (remember, we skipped the one at index 0) we start at the third bit (we skip the first two, as bitShift tells us).

That can be visualized in a table like this:

Byte Index | Hex Value | Binary | Comment
0 | 0xAB | 10101011 | We’re skipping this entirely.
1 | 0xCD | 11001101 | We start comparing from the third bit.
2 | 0xEF | 11101111 | We’ll use the data from this byte to make sure we can build a full byte.
3 | 0x12 | 00010010 | Used in comparison later.

Now, I mentioned that we start our parsing with the byte at index 1 at the third bit. Look at the binary representation for that byte:

11001101

We skip the first two bits, and shift the bits left, padding the “missing” bits with zeroes at the end:

00110100

Now, instead of keeping the zeroes, we can steal the two leading bits from the next byte in our sequence (at index 2, that is, 0xEF). We shift it right by six bits to get its top two bits (because that’s all we need to complete the byte), so that:

11101111

Becomes:

00000011

So now from the shifted bytes we have these two values:

00110100
00000011

Combining them gives us:

00110111

This binary value does not match the first byte of our pattern (11001101), so the search moves on to the next offset, and so on.
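If you want to sanity-check the example, the same arithmetic can be reproduced with a few lines that mirror the logic inside IsBitMatch:

// Reproduce the worked example: data from the table above, bit offset 10.
byte[] data = { 0xAB, 0xCD, 0xEF, 0x12 };
int bitOffset = 10;

int byteOffset = bitOffset / 8; // 1
int bitShift = bitOffset % 8;   // 2

// Shift the current byte left and borrow the top bits of the next byte.
byte combined = (byte)(data[byteOffset] << bitShift);
combined |= (byte)(data[byteOffset + 1] >> (8 - bitShift));

Console.WriteLine(Convert.ToString(combined, 2).PadLeft(8, '0')); // prints 00110111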

Digging through the chunks #

So now that we have an idea of how to look for data, we can start looking at individual “envelopes” that contain player details. As I mentioned above, many chunks are usually provided for a given film; however, the ones that capture specific events, like deaths, kills, or medal awards, are all aggregated in the last film chunk file, the one with a ChunkType of 3.

Within the very last chunk (of type 3) the events are usually structured like this:

Field | Size
Header | 12 bytes
Gamertag (Unicode) | 32 bytes
Padding | 15 bytes
Type | 1 byte
Timestamp | 4 bytes
Padding | 3 bytes
Medal Marker | 1 byte
Padding | 3 bytes
Metadata (Medal Type) | 1 byte

Be careful with assuming that a gamertag is unique within a match. There were cases where the same match had a gamertag like MyGamertag and another MsMyGamertag - you can’t search just for MyGamertag as that will produce some unexpected results. You need to check that the 12 preceding bytes of “header” exist (arbitrary, given that I don’t know what they represent, but consistent for individual gamertags) and that the bytes before that header are 0x00 (I limit the check to 3 zero bytes). That way you can ensure that you are extracting a properly offset event.

Some matches may not have a chunk of type 3 - that’s very likely a bug in the API. Without this chunk there is no timeline you can parse as easily. Additionally, it’s entirely possible that the chunk of type 3 doesn’t contain gamertag-associated data. Additional investigation is needed to understand that behavior.
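Before reaching for specialized tooling, here is a rough C# sketch of what reading one of these envelopes could look like. It assumes the chunk is already decompressed, the event happens to be byte-aligned at a known offset, and the layout above holds - the field and record names are mine, not anything official:

// Rough sketch: read a single event envelope starting at "offset" in a
// decompressed type 3 chunk, following the layout described in the table above.
public record FilmEvent(string Gamertag, byte EventType, uint TimestampMilliseconds, byte MedalMarker, byte MedalType);

public static FilmEvent ReadEvent(byte[] chunk, int offset)
{
    // 12-byte header, then 32 bytes of UTF-16 gamertag.
    string gamertag = System.Text.Encoding.Unicode
        .GetString(chunk, offset + 12, 32)
        .TrimEnd('\0');

    // 15 bytes of padding, then a single event type byte.
    byte eventType = chunk[offset + 59];

    // 4-byte timestamp; reverse it before conversion, as shown later in the post.
    byte[] timestampBytes = new byte[4];
    Array.Copy(chunk, offset + 60, timestampBytes, 0, 4);
    Array.Reverse(timestampBytes);
    uint timestampMs = BitConverter.ToUInt32(timestampBytes, 0);

    // 3 bytes of padding, the medal marker, 3 more padding bytes, then the medal type.
    byte medalMarker = chunk[offset + 67];
    byte medalType = chunk[offset + 71];

    return new FilmEvent(gamertag, eventType, timestampMs, medalMarker, medalType);
}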

If you are using a tool like 010 Editor and extract the binary data on a per-file basis (i.e., find the bit positions for the gamertag start and then extract the bytes into its own file from there), you can use the following extremely basic binary template to highlight the sequences for easier parsing:

struct HEADER
{
    char bytes[12];
};

struct GAMERTAG
{
    char bytes[32];
};

struct TYPE
{
    char bytes[1];
};

struct TIMESTAMP
{
    char bytes[4];
};

struct BUFF_PADDING
{
    char bytes[15];
};

struct PADDING
{
    char bytes[3];
};

struct MEDAL_MARKER
{
    char bytes[1];
};

local int offset = 0;

HEADER header <bgcolor=0x659157>;
offset += sizeof(HEADER);
FSeek(offset);

GAMERTAG gt <bgcolor=cGreen>;
offset += sizeof(GAMERTAG);
FSeek(offset);

BUFF_PADDING bp <bgcolor=cBlue>;
offset += sizeof(BUFF_PADDING);
FSeek(offset);

TYPE type <bgcolor=cYellow>;
offset += sizeof(TYPE);
FSeek(offset);

TIMESTAMP ts <bgcolor=cRed>;
offset += sizeof(TIMESTAMP);
FSeek(offset);

PADDING padding <bgcolor=cBlue>;
offset += sizeof(PADDING);
FSeek(offset);

MEDAL_MARKER mm <bgcolor=0xF7AF9D>;
offset += sizeof(MEDAL_MARKER);
FSeek(offset);

PADDING padding <bgcolor=cBlue>;
offset += sizeof(PADDING);
FSeek(offset);

MEDAL_MARKER mtype <bgcolor=0xFFC0CB>;
offset += sizeof(MEDAL_MARKER);
FSeek(offset);

The structure above is consistent across matches - I’ve extracted thousands of my own games and ran into minimal issues (with the exception of a few stray gamertags).

Extracting timeline metadata #

Out of all the fields above, the most interesting to me is the metadata one. The metadata field (i.e., the medal type) captures numeric values that represent medals. The values are different from those in the official medal mapping, and there is no clear mapping between them and a human-readable JSON representation, so we need to infer them by looking at medal volume here and correlating it with medals earned per match or across a player’s career. Andy Curtis did the heavy lifting on this for some medals in his SPNKr project (a few are pending additional research).

The following medals are currently known:

Medal ID | Medal
0 | Double Kill
1 | Triple Kill
2 | Overkill
3 | Killtacular
4 | Killtrocity
5 | Killamanjaro
6 | Killtastrophe
7 | Killpocalypse
8 | Killionaire
9 | Killing Spree
10 | Killing Frenzy
11 | Running Riot
12 | Rampage
13 | Perfection
26 | Killjoy
27 | Nightmare
28 | Boogeyman
29 | Grim Reaper
30 | Demon
31 | Flawless Victory
32 | Steaktacular
36 | Stopped Short
37 | Flag Joust
38 | Goal Line Stand
39 | Necromancer
43 | Ace
44 | Extermination
45 | Sole Survivor
46 | Untainted
47 | Blight
48 | Disease
49 | Plague
51 | Pestilence
53 | Culling
54 | Cleansing
55 | Purge
56 | Purification
57 | Divine Intervention
58 | Zombie Slayer
59 | Undead Hunter
60 | Hell’s Janitor
61 | The Sickness
62 | Spotter
63 | Treasure Hunter
64 | Saboteur
65 | Wingman
66 | Wheelman
67 | Gunner
68 | Driver
69 | Pilot
70 | Tanker
71 | Rifleman
72 | Bomber
73 | Grenadier
74 | Boxer
75 | Warrior
76 | Gunslinger
77 | Scattergunner
78 | Sharpshooter
79 | Marksman
80 | Heavy
81 | Bodyguard
82 | Back Smack
83 | Nuclear Football
84 | Boom Block
85 | Bulltrue
86 | Cluster Luck
87 | Dogfight
88 | Harpoon
89 | Mind the Gap
90 | Ninja
91 | Odin’s Raven
92 | Pancake
93 | Quigley
94 | Remote Detonation
95 | Return to Sender
96 | Rideshare
97 | Skyjack
98 | Stick
99 | Tag & Bag
100 | Whiplash
101 | Kong
102 | Autopilot Engaged
103 | Sneak King
104 | Windshield Wiper
105 | Reversal
106 | Hail Mary
107 | Nade Shot
108 | Snipe
109 | Perfect
110 | Bank Shot
111 | Fire & Forget
112 | Ballista
113 | Pull
114 | No Scope
115 | Achilles Spine
116 | Grand Slam
117 | Guardian Angel
118 | Interlinked
119 | Death Race
120 | Chain Reaction
121 | 360
122 | Combat Evolved
123 | Deadly Catch
124 | Driveby
125 | Fastball
126 | Flyin’ High
127 | From the Grave
128 | From the Void
129 | Grapple-jack
130 | Hold This
131 | Last Shot
132 | Lawnmower
133 | Mount Up
134 | Off the Rack
135 | Quick Draw
137 | Pineapple Express
138 | Ramming Speed
139 | Reclaimer
140 | Shot Caller
141 | Yard Sale
142 | Special Delivery
146 | Fumble
148 | Straight Balling
151 | Always Rotating
152 | Hill Guardian
153 | Clock Stop
154 | Secure Line
156 | Splatter
162 | All That Juice
163 | Great Journey
165 | Breacher
166 | Mounted & Loaded
167 | Monopoly
168 | Counter-snipe
174 | Driving Spree
175 | Death Cabbie
176 | Immortal Chauffeur
177 | Blind Fire
178 | Hang Up
179 | Call Blocked
180 | Clear Reception

The event type, also captured in the envelope, can be one of the following:

Type (Decimal) | Description
10 | Mode-specific events (e.g., captured the flag, killed the carrier, stole the flag)
20 | Death
50 | Kill

Any other type identifier (such as 51, 100, or 250) that you may see here, when associated with a medal, is representative of the medal sorting weight. It maps 1:1 to the information that you can get from the medal metadata endpoint.

Timestamp data is represented in milliseconds from the start of the match. You can obtain a readable value with a C# snippet like this:

Array.Reverse(timestampBytes);
var timestamp = BitConverter.ToUInt32(timestampBytes, 0);
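From there, the value is trivial to present in a readable way (timestamp being the value computed in the snippet above):

var offsetInMatch = TimeSpan.FromMilliseconds(timestamp);
Console.WriteLine(offsetInMatch);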

One thing that I haven’t yet figured out is how assists are tracked within the event batch. They are likely captured as a XUID reference further in the event envelope that I haven’t gotten to yet. This will be a topic for another blog post as we dig further through the film file format.

Finding the gamertags #

Notice that to extract all events from the last chunk, one specific thing is still needed - we need to start by knowing the gamertags for which the events should be extracted. And because gamertags are technically arbitrary text, we need to find an index somewhere. To do that, we can look inside all the other chunks (the ones that are not of type 3). That’s right - to get the list of gamertags that were involved in a given game, we need to download and parse every film chunk other than the very last one that has ChunkType set to 3.

The last chunk contains information on all players in the game but doesn’t seem to contain a clear XUID and gamertag combination that would allow us to extract them cleanly. Luckily, inside all other chunks (where ChunkType is either 1 or 2), the gamertags and XUIDs can be found by looking for the pattern 0x2D 0xC0. From that pattern, we can deduce the following structure:

Field | Size / Value
Gamertag (Unicode) | Dynamic length (32 bytes max)
Padding | 21 bytes
XUID | 8 bytes
Marker 1 | 0x2D
Marker 2 | 0xC0

Keep in mind that gamertags are stored as Unicode (UTF-16) text. This means the padding can be deceiving when you look at the binary file - you might count 22 0x00 bytes between the visible gamertag text and the XUID, when in fact the zero byte adjacent to the text is just the trailing (high) byte of the gamertag’s last UTF-16 character. Make sure to be careful when parsing the values.

We can scan all film chunks for this pattern by identifying the markers, getting the XUID, checking that the preceding 21 bytes are 0x00 (padding), and then grabbing the 32 bytes of gamertag data, which can be parsed as a Unicode string. There are more safeguards we could put in place for this logic, but ultimately it’s good enough to extract the basic data.

Once the data is extracted into, say, a dictionary, we can use that as a starting point to look up gamertags in the final (summary) chunk.
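In code, that starting point can be as simple as a dictionary keyed on the XUID - a small sketch (the XUID and gamertag below are made up):

// Sketch: map XUIDs to gamertags found in the type 1/2 chunks,
// then turn each gamertag into a UTF-16 byte pattern to locate in the type 3 chunk.
var players = new Dictionary<long, string>
{
    { 2533274823451234, "SomeGamertag" } // hypothetical entry
};

foreach (var (xuid, gamertag) in players)
{
    byte[] pattern = System.Text.Encoding.Unicode.GetBytes(gamertag);
    // Feed "pattern" into FindPattern from earlier in the post to get bit positions.
}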

As I mentioned earlier, depending on the matches that you are getting, some of them might not have a chunk with ChunkType equal to 3. Others can return HTTP 404 (blob does not exist) errors when attempting to download a chunk. The former may be a bug. The latter is likely caused by the folks at 343 occasionally cleaning up the storage from older matches.

In C#, the extraction logic can be formalized as such:

public static byte[] ExtractBitsFromPosition(byte[] data, int startBitPosition, int bitLength)
{
    // Calculate the actual end bit position
    int endBitPosition = startBitPosition + bitLength - 1;

    // Validate input parameters
    if (startBitPosition < 0 || endBitPosition < 0 || startBitPosition >= data.Length * 8 || endBitPosition >= data.Length * 8 || startBitPosition > endBitPosition)
    {
        throw new ArgumentOutOfRangeException("Bit positions are out of range or invalid.");
    }

    // Calculate the byte offset and bit shift for the start position
    int startByteOffset = startBitPosition / 8;
    int startBitShift = startBitPosition % 8;

    // Calculate the byte offset and bit shift for the end position
    int endByteOffset = endBitPosition / 8;
    int endBitShift = endBitPosition % 8;

    // Calculate the number of bytes to extract
    int byteCount = endByteOffset - startByteOffset + 1;

    // If there's no bit shift, we can return from the byte offset onward
    if (startBitShift == 0 && endBitShift == 0)
    {
        byte[] result = new byte[byteCount];
        Array.Copy(data, startByteOffset, result, 0, byteCount);
        return result;
    }

    // Otherwise, we need to shift the bits manually
    byte[] extractedData = new byte[byteCount];

    // Go byte by byte, shift and copy
    for (int i = 0; i < byteCount - 1; i++)
    {
        // Shift the current byte and take bits from the next byte if needed
        extractedData[i] = (byte)((data[startByteOffset + i] << startBitShift) | (data[startByteOffset + i + 1] >> (8 - startBitShift)));
    }

    // Handle the last byte (since it has no next byte to pull from)
    extractedData[byteCount - 1] = (byte)(data[startByteOffset + byteCount - 1] << startBitShift);

    // Mask the last byte to only include bits up to endBitShift
    extractedData[byteCount - 1] &= (byte)(0xFF >> (7 - endBitShift));

    return extractedData;
}

Recall that the data may or may not be byte-aligned, so we need to operate on individual bits. In turn, once we find the marker pattern in the film chunks (as we try to spot the gamertag and XUID combos), we can extract the data around it with a function like this (where pattern is set to 0x2D 0xC0):

public static void ProcessData(byte[] data, byte[] pattern)
{
    List<int> patternPositions = FindPattern(data, pattern);

    foreach (int patternPosition in patternPositions)
    {
        int xuidStartPosition = patternPosition - 8 * 8;
        byte[] xuid = ExtractBitsFromPosition(data, xuidStartPosition, 8*8);
        var convertedXuid = ConvertBytesToInt64(xuid);

        if (convertedXuid != 0)
        {
            int prePatternPosition = xuidStartPosition - 21 * 8;
            var bytePrefixValidated = AreAllBytesZero(data, prePatternPosition, 21 * 8);

            if (bytePrefixValidated)
            {
                Console.WriteLine($"XUID: {convertedXuid}");
                byte[] undefinedData = ExtractBitsFromPosition(data, prePatternPosition - 32 * 8, 32 * 8);
                Console.WriteLine($"Undefined Data (until 0x00 0x00): {ConvertBytesToText(undefinedData)}");
            }
        }
    }

    Console.ReadLine();
}
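ProcessData leans on a few helpers that aren’t shown in the post (FindPattern is the one from the search tool earlier). Here is a minimal sketch of what the remaining ones could look like - note that the byte order in ConvertBytesToInt64 is an assumption on my part, so flip it if the extracted XUIDs look wrong:

public static long ConvertBytesToInt64(byte[] bytes)
{
    // Assumption: most significant byte first, mirroring the timestamp handling above.
    byte[] copy = (byte[])bytes.Clone();
    Array.Reverse(copy);
    return BitConverter.ToInt64(copy, 0);
}

public static bool AreAllBytesZero(byte[] data, int startBitPosition, int bitLength)
{
    // Reuse the bit-level extraction so this works on unaligned data too.
    byte[] slice = ExtractBitsFromPosition(data, startBitPosition, bitLength);
    return slice.All(b => b == 0);
}

public static string ConvertBytesToText(byte[] bytes)
{
    // Gamertags are UTF-16; decode and trim the trailing NUL characters.
    return System.Text.Encoding.Unicode.GetString(bytes).TrimEnd('\0');
}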

To simplify how I extract the data, I built a tool called OpenSpartan/film-event-extractor which will let you log in with your Xbox Live ID and aggregate all match data within a local SQLite database. The entire parsing logic is very much in flux (feel free to follow the discussion on this), but once it stabilizes I can see integrating this better in OpenSpartan Workshop.

For my own account, with more than seven thousand matches played, the entire aggregation took around 48 hours. I haven’t yet optimized (and parallelized) the code, so part of that can be attributed to me building a slower-than-needed tool, but it works for now and I can start analyzing the data.

The data that is available through the API is mostly good as-is, but an expanded dataset that accounts for film-based details enables me to see two things more clearly:

  • Mapping between gamertags and XUIDs at the time of the match (gamertags are mutable as users can change them, XUIDs are immutable). This way I don’t need to worry about doing out-of-band conversion to get an understanding of who I played against, since the match details API only returns XUIDs.
  • Times when specific events occur in-game. I can see how quickly I earn the first medal in the game, or how quickly I get to the first kill or death.

What’s next #

There are a few improvements that I want to make, both to the open-source tool that I built and to my understanding of the film files. I alluded to assists earlier - that’s a data point I definitely want to cover. Additionally, film files may contain the data required to build heatmaps of map movement. For that, we need to get better at replicating behaviors in the game - that is, understanding how the binary data changes with movement, weapon switches, use of grenades, and so on. Something tells me it will be a much more protracted project than I initially anticipated 🤔